biological signal
Detecting Batch Heterogeneity via Likelihood Clustering
Batch effects represent a major confounder in genomic diagnostics. In copy number variant (CNV) detection from NGS, many algorithms compare read depth between test samples and a reference sample, assuming they are process-matched. When this assumption is violated, with causes ranging from reagent lot changes to multi-site processing, the reference becomes inappropriate, introducing false CNV calls or masking true pathogenic variants. Detecting such heterogeneity before downstream analysis is critical for reliable clinical interpretation. Existing batch effect detection methods either cluster samples based on raw features, risking conflation of biological signal with technical variation, or require known batch labels that are frequently unavailable. We introduce a method that addresses both limitations by clustering samples according to their Bayesian model evidence. The central insight is that evidence quantifies compatibility between data and model assumptions: technical artifacts violate those assumptions and reduce evidence, whereas biological variation, including CNV status, is anticipated by the model and yields high evidence. This asymmetry provides a discriminative signal that separates batch effects from biology. We formalize heterogeneity detection as a likelihood ratio test for mixture structure in evidence space, using parametric bootstrap calibration to ensure conservative false positive rates. We validate our approach on synthetic data demonstrating proper Type I error control, on three clinical targeted sequencing panels (liquid biopsy, BRCA, and thalassemia) exhibiting distinct batch effect mechanisms, and on mouse electrophysiology recordings demonstrating cross-modality generalization. Our method achieves superior clustering accuracy compared to standard correlation-based and dimensionality-reduction approaches while maintaining the conservativeness required for clinical use.
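The likelihood ratio test with parametric bootstrap calibration described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample has already been reduced to a single log-evidence value, that the null hypothesis (one homogeneous batch) is a single Gaussian in evidence space, and that the alternative is a two-component Gaussian mixture fitted by EM. The function names, the variance floor, and the default iteration counts are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def log_normal_pdf(x, mu, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def fit_one(xs):
    """Maximum-likelihood single Gaussian (null model: one batch)."""
    mu = statistics.fmean(xs)
    sd = max(statistics.pstdev(xs), 1e-9)
    return mu, sd, sum(log_normal_pdf(x, mu, sd) for x in xs)


def e_step(xs, w, mu, sd):
    """Return the mixture log-likelihood and responsibilities of component 0."""
    total, resp = 0.0, []
    for x in xs:
        l0 = math.log(w) + log_normal_pdf(x, mu[0], sd[0])
        l1 = math.log(1 - w) + log_normal_pdf(x, mu[1], sd[1])
        top = max(l0, l1)
        lse = top + math.log(math.exp(l0 - top) + math.exp(l1 - top))
        total += lse
        resp.append(math.exp(l0 - lse))
    return total, resp


def fit_two(xs, n_iter=50):
    """EM for a two-component Gaussian mixture; returns its log-likelihood."""
    s = sorted(xs)
    half = len(s) // 2
    spread = max(statistics.pstdev(xs), 1e-9)
    floor = 0.05 * spread  # assumed variance floor; blocks degenerate spikes
    w = 0.5
    mu = [statistics.fmean(s[:half]), statistics.fmean(s[half:])]
    sd = [spread, spread]
    for _ in range(n_iter):
        _, r = e_step(xs, w, mu, sd)
        n0 = max(sum(r), 1e-9)
        n1 = max(len(xs) - sum(r), 1e-9)
        w = min(max(n0 / len(xs), 1e-6), 1 - 1e-6)
        mu = [sum(ri * x for ri, x in zip(r, xs)) / n0,
              sum((1 - ri) * x for ri, x in zip(r, xs)) / n1]
        sd = [max(math.sqrt(sum(ri * (x - mu[0]) ** 2
                                for ri, x in zip(r, xs)) / n0), floor),
              max(math.sqrt(sum((1 - ri) * (x - mu[1]) ** 2
                                for ri, x in zip(r, xs)) / n1), floor)]
    return e_step(xs, w, mu, sd)[0]


def lrt_statistic(xs):
    """2 * (log-lik of mixture - log-lik of single Gaussian), clipped at 0."""
    return max(0.0, 2.0 * (fit_two(xs) - fit_one(xs)[2]))


def bootstrap_lrt(log_evidence, n_boot=200, seed=0):
    """Calibrate the LRT by simulating from the fitted null model."""
    rng = random.Random(seed)
    mu, sd, _ = fit_one(log_evidence)
    observed = lrt_statistic(log_evidence)
    exceed = 0
    for _ in range(n_boot):
        simulated = [rng.gauss(mu, sd) for _ in log_evidence]
        if lrt_statistic(simulated) >= observed:
            exceed += 1
    # The +1 terms keep the p-value strictly positive (conservative estimate).
    return observed, (1 + exceed) / (n_boot + 1)
```

Calling `bootstrap_lrt(values)` on a list of per-sample log-evidence values returns the observed statistic and a bootstrap p-value; a small p-value suggests more than one batch. The bootstrap is used because the asymptotic chi-squared approximation is known to be unreliable when testing the number of mixture components.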
Removing Inter-Experimental Variability from Functional Data in Systems Neuroscience
Integrating data from multiple experiments is common practice in systems neuroscience, but it requires inter-experimental variability to be negligible compared to the biological signal of interest. This requirement is rarely fulfilled; systematic changes between experiments can drastically affect the outcome of complex analysis pipelines. Modern machine learning approaches designed to adapt models across multiple data domains offer flexible ways of removing inter-experimental variability where classical statistical methods often fail. While applications of these methods have been mostly limited to single-cell genomics, in this work we develop a theoretical framework for domain adaptation in systems neuroscience. We implement this in an adversarial optimization scheme that removes inter-experimental variability while preserving the biological signal.
Incorporating intratumoral heterogeneity into weakly-supervised deep learning models via variance pooling
Carmichael, Iain, Song, Andrew H., Chen, Richard J., Williamson, Drew F. K., Chen, Tiffany Y., Mahmood, Faisal
Supervised learning tasks such as cancer survival prediction from gigapixel whole slide images (WSIs) are a critical challenge in computational pathology that requires modeling complex features of the tumor microenvironment. These learning tasks are often solved with deep multi-instance learning (MIL) models that do not explicitly capture intratumoral heterogeneity. We develop a novel variance pooling architecture that enables a MIL model to incorporate intratumoral heterogeneity into its predictions. Two interpretability tools based on "representative patches" are illustrated to probe the biological signals captured by these models. An empirical study with 4,479 gigapixel WSIs from the Cancer Genome Atlas shows that adding variance pooling onto MIL frameworks improves survival prediction performance for five cancer types.
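As a rough illustration of the idea behind variance pooling, the sketch below summarizes a bag of patch embeddings by its mean and by the variance of the patches along a few projection directions. It is a simplified stand-in, not the architecture from the paper: the projection vectors would be learned in a real model but are passed in as fixed values here, all patches are weighted uniformly, and the function name `variance_pool` is hypothetical.

```python
def variance_pool(bag, projections):
    """Pool a bag of instance embeddings into mean and variance features.

    bag:         list of instance embeddings, each a list of floats
    projections: list of direction vectors (learned in practice; fixed here)
    Returns the mean embedding followed by one variance per projection.
    """
    n = len(bag)
    dim = len(bag[0])

    # Standard mean pooling: captures the "average" appearance of the bag.
    mean_feat = [sum(inst[j] for inst in bag) / n for j in range(dim)]

    # Variance of the instances along each projection direction:
    # large values indicate a heterogeneous bag.
    var_feat = []
    for v in projections:
        scores = [sum(a * b for a, b in zip(inst, v)) for inst in bag]
        centre = sum(scores) / n
        var_feat.append(sum((s - centre) ** 2 for s in scores) / n)

    return mean_feat + var_feat


# Two toy bags with identical means but different spread.
homogeneous = [[1.0, 1.0], [1.0, 1.0]]
heterogeneous = [[0.0, 2.0], [2.0, 0.0]]
axes = [[1.0, 0.0], [0.0, 1.0]]

print(variance_pool(homogeneous, axes))    # [1.0, 1.0, 0.0, 0.0]
print(variance_pool(heterogeneous, axes))  # [1.0, 1.0, 1.0, 1.0]
```

Mean pooling alone cannot tell the two toy bags apart, while the appended variance features can, which is the motivation for feeding such features to the downstream prediction head.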
Physiological signals could be the key to 'emotionally intelligent' AI, scientists say: Researchers integrate biological signals with gold-standard machine learning methods to enable emotionally intelligent speech dialog systems
"Multimodal sentiment analysis" is a group of methods that constitute the gold standard for an AI dialog system with sentiment detection. These methods can automatically analyze a person's psychological state from their speech, voice color, facial expression, and posture and are crucial for human-centered AI systems. The technique could potentially realize an emotionally intelligent AI with beyond-human capabilities, which understands the user's sentiment and generates a response accordingly. However, current emotion estimation methods focus only on observable information and do not account for the information contained in unobservable signals, such as physiological signals. Such signals are a potential gold mine of emotions that could improve the sentiment estimation performance tremendously.
Machine Learning Researchers Spot Deep Fakes From Heartbeats
"Anatomical actions such as heartbeat, blood flow, or breathing, create subtle changes that are not visible to the eye but still detectable computationally." A mere suspicion of doctored video can topple governments. It happened in Gabon, when the President recorded a video message that was awkwardly shot. The inconsistency was mistaken for a deepfake, and that led to a coup as the opposition assumed the death of the President. Though AI-generated imagery has great potential, its malicious use is equally damning.
AI researchers use heartbeat detection to identify deepfake videos
Facebook and Twitter earlier this week took down social media accounts associated with the Internet Research Agency, the Russian troll farm that interfered in the U.S. presidential election four years ago, that had been spreading misinformation to up to 126 million Facebook users. Today, Facebook rolled out measures aimed at curbing disinformation ahead of Election Day in November. Deepfakes can make epic memes or put Nicolas Cage in every movie, but they can also undermine elections. As threats of election interference mount, two teams of AI researchers have recently introduced novel approaches to identifying deepfakes by watching for evidence of heartbeats. Existing deepfake detection models focus on traditional media forensics methods, like tracking unnatural eyelid movements or distortions at the edge of the face.
Leveraging AI to Cure Cancer
Cancer is one of the leading causes of death worldwide. Moreover, 1 in 4 Canadians is predicted to die of cancer during their lifetime; that's 9,250,000 people. Recently, advances in technologies such as Artificial Intelligence have been helping researchers revolutionize the future of healthcare, from identifying patterns in medical images to predicting new target proteins for drugs! This technology is showing significant ability to change the lives of millions around the world. I used AI and delved deep into the subfield called Generative Models.